Capstone Projects - Pneumonia Detection

Project Description - To detect pneumonia we need to detect inflammation of the lungs. In this project the goal is to build a pneumonia detection system that locates the position of inflammation in a chest X-ray (CXR) image. Automating pneumonia screening in chest radiographs, and providing the affected area through a bounding box, can assist physicians in making better clinical decisions or even replace human judgement in certain functional areas of healthcare (e.g., radiology). Guided by relevant clinical questions, powerful AI techniques can unlock clinically relevant information hidden in massive amounts of data, which in turn can assist clinical decision making.

Objective of Project - The objective of this project is to build an algorithm that locates the position of inflammation in a medical image; it needs to locate lung opacities on chest radiographs automatically. Specifically, we will:
• Build an object detection model.
• Use transfer learning to fine-tune a model.
• Set the optimizers, loss functions, epochs, learning rate, batch size, checkpointing, early stopping, etc.

The above graph shows the count of patients in each class:
• No Lung Opacity / Not Normal – 11821
• Normal – 8851
• Lung Opacity – 9555

The above graph shows the count of patients for each target:
• Target 0 (No Pneumonia) – 20672
• Target 1 (Pneumonia) – 9555
We can also infer that the dataset is imbalanced between Target 0 and Target 1.

The above pair plot helps us understand the positive and negative correlations between all attributes. From the plot we can see that a few attribute pairs show positive or negative correlation. To quantify the strength of these relationships we need to compute correlation coefficients.

The above heat map helps us read off the correlation coefficient values:
• There is a moderate positive correlation between height and width, i.e., 0.6.
• There is also a moderate negative correlation between height and y, i.e., -0.65.
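The coefficients shown in the heat map are Pearson correlations and can be computed directly with NumPy; the rows below are illustrative stand-ins for the real bounding-box values, not data from the dataset:

```python
import numpy as np

# Toy bounding-box attributes (x, y, width, height); illustrative rows only,
# the real values come from the RSNA label table.
data = np.array([
    [264, 152, 213, 379],
    [562, 152, 256, 453],
    [323, 577, 160, 104],
    [695, 575, 162, 137],
    [221, 125, 230, 504],
], dtype=float)

# Pearson correlation matrix: rowvar=False treats each column as a variable.
corr = np.corrcoef(data, rowvar=False)

cols = ["x", "y", "width", "height"]
for i, a in enumerate(cols):
    for j, b in enumerate(cols):
        if i < j:
            print(f"corr({a}, {b}) = {corr[i, j]:+.2f}")
```

The same matrix is what seaborn's heatmap renders; the diagonal is always 1 and the matrix is symmetric.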

To dig deeper into the correlations, we visualize joint plots of height vs. width and height vs. y. The regression lines confirm the positive and negative correlations respectively. These graphs also show the distribution of each attribute.

We read the CXR images, extract one image, and process its DICOM information. Some useful information with predictive value is available in the DICOM metadata, for example:
• Patient sex, patient age, modality, view position, rows & columns, and pixel spacing.
• The actual CXR image is stored in the last element, tagged Pixel Data, as an array.
• All the remaining tags or elements are metadata providing additional details.

The above contour and joint plot graphs show the position of pneumonia in the lungs and the number of affected areas in a CXR image. In the graphs below there are two areas affected by pneumonia.

The above graph shows the patients' age distribution in pneumonia detection. The orange graph shows the pneumonia patients; most of the data lies between 33 and 59 years. This suggests that in this dataset pneumonia mostly affects working-age adults.

The above graphs show the proportions of patient sex and view position in pneumonia detection. Orange shows pneumonia patients and blue shows normal cases:
• Patient Sex graph – cases of pneumonia are more frequent in males than in females in this dataset.
• View Position graph – pneumonia is detected more often in the AP view position than in the PA view position.

Visualize the Images

Showing some random DICOM images of patients who do not have pneumonia, but belong to the class No Lung Opacity / Not Normal.

Showing some random DICOM images of patients who do not have pneumonia, belonging to the class Normal.

Model Building

Load pneumonia locations

The table contains one [filename : pneumonia location] pair per row.

If a filename contains multiple pneumonia regions, the table contains multiple rows with the same filename but different pneumonia locations. If a filename contains no pneumonia, it has a single row with an empty pneumonia location. The code below loads the table and transforms it into a dictionary.

The dictionary uses the filename as the key and a list of pneumonia locations in that file as the value. If a filename is not present in the dictionary, it contains no pneumonia.
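Assuming the table is the RSNA label CSV (columns patientId, x, y, width, height, Target), the dictionary can be built along these lines (function name and the inline sample are illustrative):

```python
import csv
import io

def load_pneumonia_locations(csv_file):
    """Build {filename: [[x, y, w, h], ...]} from the label table.

    Rows with Target == 0 have empty location fields and are skipped,
    so a filename absent from the dictionary means "no pneumonia".
    """
    locations = {}
    reader = csv.DictReader(csv_file)
    for row in reader:
        if row["Target"] == "1":
            box = [float(row[k]) for k in ("x", "y", "width", "height")]
            locations.setdefault(row["patientId"], []).append(box)
    return locations

# Tiny inline sample in the assumed CSV layout.
sample = io.StringIO(
    "patientId,x,y,width,height,Target\n"
    "p1,264,152,213,379,1\n"
    "p1,562,152,256,453,1\n"
    "p2,,,,,0\n"
)
locs = load_pneumonia_locations(sample)
print(locs)  # p1 maps to two boxes; p2 does not appear
```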

Load image filenames

Data generator

The dataset is too large to fit into memory, so we need to create a generator that loads data on the fly.

The generator takes in a list of filenames, a batch_size, and other parameters.

The generator outputs a random batch of NumPy images and NumPy masks.
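A minimal sketch of such a generator, with a stand-in loader in place of the real DICOM-reading code (all names and the 256-pixel image size here are illustrative):

```python
import numpy as np

def batch_generator(filenames, load_fn, batch_size=8, shuffle=True):
    """Yield (images, masks) batches indefinitely, loading files on the fly.

    load_fn(name) must return (image, mask) as 2-D float arrays; in the
    real pipeline it would read a DICOM with pydicom and rasterise the
    pneumonia boxes into a binary mask.
    """
    filenames = list(filenames)
    while True:
        if shuffle:
            np.random.shuffle(filenames)
        for start in range(0, len(filenames) - batch_size + 1, batch_size):
            batch = filenames[start:start + batch_size]
            images, masks = zip(*(load_fn(name) for name in batch))
            # Add a trailing channel axis so the arrays fit a Keras model.
            yield (np.stack(images)[..., np.newaxis],
                   np.stack(masks)[..., np.newaxis])

# Stand-in loader: random image, empty mask (real code reads DICOM files).
fake_load = lambda name: (np.random.rand(256, 256), np.zeros((256, 256)))
images, masks = next(batch_generator([f"img_{i}" for i in range(16)], fake_load))
print(images.shape, masks.shape)  # (8, 256, 256, 1) (8, 256, 256, 1)
```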

As per the model, we are getting loss: 0.4210 - accuracy: 0.9690 - mean_iou: 0.304 - val_loss: 0.4252 - val_accuracy: 0.9654 - val_mean_iou: 0.7245.
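The mean IoU reported above measures mask overlap (intersection over union). A small sketch of how IoU can be computed for one predicted mask — illustrative only, not the exact Keras metric used in training:

```python
import numpy as np

def mask_iou(pred, truth, threshold=0.5):
    """IoU between a predicted probability mask and a binary ground truth."""
    pred = pred > threshold
    truth = truth > 0.5
    union = np.logical_or(pred, truth).sum()
    if union == 0:          # both masks empty: count as a perfect match
        return 1.0
    return np.logical_and(pred, truth).sum() / union

truth = np.zeros((8, 8)); truth[2:6, 2:6] = 1     # 4x4 ground-truth box
pred = np.zeros((8, 8)); pred[4:8, 4:8] = 0.9     # 4x4 prediction, offset
print(mask_iou(pred, truth))  # overlap 2x2=4, union 16+16-4=28 -> ~0.143
```

Averaging this score over a batch gives the mean IoU that the training log reports.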

Plot Accuracy / Loss

Predict test images

Build YOLO Model

Clone YOLOv3

Generate images and labels for training YOLOv3

Plot a sample train image and label

We should give YOLO the list of image paths: two separate text files, one listing the training images and one listing the validation images.
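One way to produce the two list files, assuming a simple random split (the function name, file names, and 10% validation fraction are illustrative choices):

```python
import os
import random
import tempfile

def write_image_lists(image_paths, out_dir, val_fraction=0.1, seed=42):
    """Split image paths and write train.txt / valid.txt for YOLOv3."""
    paths = sorted(image_paths)
    random.Random(seed).shuffle(paths)
    n_val = int(len(paths) * val_fraction)
    splits = {"valid.txt": paths[:n_val], "train.txt": paths[n_val:]}
    for name, subset in splits.items():
        with open(os.path.join(out_dir, name), "w") as f:
            f.write("\n".join(subset) + "\n")
    return splits

# Demo with made-up paths in a temporary directory.
demo_paths = [f"images/patient_{i:03d}.jpg" for i in range(20)]
out = tempfile.mkdtemp()
splits = write_image_lists(demo_paths, out)
print(len(splits["train.txt"]), len(splits["valid.txt"]))  # 18 2
```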

Create test image and labels for YOLOv3

Plot a sample test Image

Prepare Configuration Files for Using YOLOv3

For training, we download the pre-trained model weights (darknet53.conv.74) using the following wget command. The author of darknet also uses these pre-trained weights in different image recognition tasks.

Training YOLOv3

Use the trained YOLOv3 model on test images

We are facing some issues in running the trained YOLOv3 model on the test images; we will address this and fine-tune it in the next report and the final submission.

Build a CNN Model for Classification of Pneumonia Patients

We ran the code below to convert the images and labels into arrays and, to save time on subsequent runs, saved them in .npy format in a local directory. From the next run we can read the .npy files directly from the directory, saving the time of rebuilding the arrays again and again.
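The caching step might look like this sketch, with a stand-in builder in place of the real image-decoding loop (the helper name and file path are illustrative):

```python
import os
import tempfile
import numpy as np

def load_or_build(cache_path, build_fn):
    """Return the cached array if present, otherwise build and save it."""
    if os.path.exists(cache_path):
        return np.load(cache_path)
    arr = build_fn()
    np.save(cache_path, arr)
    return arr

# Demo: the expensive step (decoding every image) is stood in for by a
# random array; real code would loop over the DICOM files instead.
cache = os.path.join(tempfile.mkdtemp(), "images.npy")
first = load_or_build(cache, lambda: np.random.rand(4, 64, 64))   # builds + saves
second = load_or_build(cache, lambda: np.zeros((1,)))             # reads cache
print(np.array_equal(first, second))  # True
```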

Use SMOTE to handle Imbalanced Dataset
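The project applies SMOTE from imbalanced-learn. As a self-contained illustration of the underlying idea — generating synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours — here is a minimal NumPy sketch (all names are our own, not imblearn's API):

```python
import numpy as np

def smote_like_oversample(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic minority samples, SMOTE-style:
    pick a minority point, pick one of its k nearest minority
    neighbours, and interpolate a random fraction of the way."""
    rng = np.random.default_rng(seed)
    # Pairwise distances within the minority class.
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # exclude self-matches
    neighbours = np.argsort(d, axis=1)[:, :k]
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        j = rng.choice(neighbours[i])
        gap = rng.random()                 # fraction of the way to the neighbour
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

minority = np.random.default_rng(1).random((10, 5))  # 10 samples, 5 features
new = smote_like_oversample(minority, n_new=15)
print(new.shape)  # (15, 5)
```

In practice the imblearn SMOTE class does this (with more options) and is what the project actually used.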

VGG16 Model

VGG16 with Smote Data

MobileNet Model

We built a CNN model and then used transfer learning models, namely VGG16, ResNet50 & MobileNet. We also handled the imbalanced dataset through SMOTE, re-ran all the models, and checked their accuracy.

We will further fine-tune all models in the final report submission.